Visualizations, the grammar of graphics, and ggplot2

Benjamin Soltoff, University of Chicago

ID  \(N\)  \(\bar{X}\)  \(\bar{Y}\)  \(\sigma_{X}\)  \(\sigma_{Y}\)  \(R\)
 1  142    54.26610     47.83472     16.76983        26.93974        -0.0641284
 2  142    54.26873     47.83082     16.76924        26.93573        -0.0685864
 3  142    54.26732     47.83772     16.76001        26.93004        -0.0683434
 4  142    54.26327     47.83225     16.76514        26.93540        -0.0644719
 5  142    54.26030     47.83983     16.76774        26.93019        -0.0603414
 6  142    54.26144     47.83025     16.76590        26.93988        -0.0617148
 7  142    54.26881     47.83545     16.76670        26.94000        -0.0685042
 8  142    54.26785     47.83590     16.76676        26.93610        -0.0689797
 9  142    54.26588     47.83150     16.76885        26.93861        -0.0686092
10  142    54.26734     47.83955     16.76896        26.93027        -0.0629611
11  142    54.26993     47.83699     16.76996        26.93768        -0.0694456
12  142    54.26692     47.83160     16.77000        26.93790        -0.0665752
13  142    54.26015     47.83972     16.76996        26.93000        -0.0655833
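
The near-identical rows above only show how little summary statistics can reveal on their own. A minimal sketch of how such a table might be computed and then checked visually, assuming the data are the Datasaurus Dozen as shipped in the `datasauRus` package (an assumption, since the slide does not name its source). The column names `n`, `mean_x`, and so on are illustrative, not taken from the slides.

```r
library(dplyr)
library(ggplot2)
library(datasauRus)  # assumption: provides `datasaurus_dozen` (columns dataset, x, y)

# One row of summary statistics per dataset
summary_stats <- datasaurus_dozen %>%
  group_by(dataset) %>%
  summarize(n      = n(),
            mean_x = mean(x),
            mean_y = mean(y),
            sd_x   = sd(x),
            sd_y   = sd(y),
            r      = cor(x, y))
summary_stats

# The same data, one scatterplot per dataset
ggplot(data = datasaurus_dozen, mapping = aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ dataset)
```

If the assumption holds, the faceted scatterplots should look very different from one another even though the summary rows barely differ, which is the case for plotting data before trusting its summaries.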

Grammar

The whole system and structure of a language or of languages in general, usually taken as consisting of syntax and morphology (including inflections) and sometimes also phonology and semantics.

Grammar of graphics

  • “The fundamental principles or rules of an art or science”
  • A grammar used to describe and create a wide range of statistical graphics
  • Layered grammar of graphics
    • ggplot2
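
As a rough illustration of what "layered" means in practice, the sketch below builds one plot from separate grammar components: data, aesthetic mappings, geometric objects, a statistical transformation, a scale, a coordinate system, and facets. It uses the `mpg` data frame bundled with ggplot2 purely as a stand-in; the variable choices are illustrative and do not come from the slides.

```r
library(ggplot2)

p <- ggplot(data = mpg,                             # data
            mapping = aes(x = displ, y = hwy)) +    # default aesthetic mappings
  geom_point(mapping = aes(color = class)) +        # layer 1: points, colored by class
  geom_smooth(method = "lm", se = FALSE) +          # layer 2: linear fit (a statistical transformation)
  scale_color_viridis_d() +                         # scale: maps class values to colors
  coord_cartesian() +                               # coordinate system
  facet_wrap(~ drv) +                               # facets: one panel per drive type
  labs(x = "Engine displacement (litres)",
       y = "Highway miles per gallon")
p
```

Each `+` adds one component, so a layer or scale can be swapped without touching the rest of the specification.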

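Carte figurative des pertes successives en hommes de l’Armée Française dans la campagne de Russie 1812–1813 (Figurative map of the successive losses in men of the French Army in the Russian campaign of 1812–1813) by Charles Joseph Minard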

Building Minard’s map in R

troops
## # A tibble: 51 x 5
##     long   lat survivors direction group
##    <dbl> <dbl>     <int> <chr>     <int>
##  1  24    54.9    340000 A             1
##  2  24.5  55      340000 A             1
##  3  25.5  54.5    340000 A             1
##  4  26    54.7    320000 A             1
##  5  27    54.8    300000 A             1
##  6  28    54.9    280000 A             1
##  7  28.5  55      240000 A             1
##  8  29    55.1    210000 A             1
##  9  30    55.2    180000 A             1
## 10  30.3  55.3    175000 A             1
## # ... with 41 more rows
cities
## # A tibble: 20 x 3
##     long   lat city          
##    <dbl> <dbl> <chr>         
##  1  24    55   Kowno         
##  2  25.3  54.7 Wilna         
##  3  26.4  54.4 Smorgoni      
##  4  26.8  54.3 Moiodexno     
##  5  27.7  55.2 Gloubokoe     
##  6  27.6  53.9 Minsk         
##  7  28.5  54.3 Studienska    
##  8  28.7  55.5 Polotzk       
##  9  29.2  54.4 Bobr          
## 10  30.2  55.3 Witebsk       
## 11  30.4  54.5 Orscha        
## 12  30.4  53.9 Mohilow       
## 13  32    54.8 Smolensk      
## 14  33.2  54.9 Dorogobouge   
## 15  34.3  55.2 Wixma         
## 16  34.4  55.5 Chjat         
## 17  36    55.5 Mojaisk       
## 18  37.6  55.8 Moscou        
## 19  36.6  55.3 Tarantino     
## 20  36.5  55   Malo-Jarosewii
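
For readers without the original data files, the rows printed above can be typed in directly. The sketch below does that with `tibble::tribble()`. The names `troops_head` and `cities_printed` are hypothetical, chosen so they do not overwrite `troops` and `cities`. Note that `troops_head` holds only the ten rows shown, not all 51.

```r
library(tibble)

# First ten rows of `troops`, copied from the printed output
troops_head <- tribble(
  ~long, ~lat, ~survivors, ~direction, ~group,
  24,    54.9, 340000L,    "A",        1L,
  24.5,  55,   340000L,    "A",        1L,
  25.5,  54.5, 340000L,    "A",        1L,
  26,    54.7, 320000L,    "A",        1L,
  27,    54.8, 300000L,    "A",        1L,
  28,    54.9, 280000L,    "A",        1L,
  28.5,  55,   240000L,    "A",        1L,
  29,    55.1, 210000L,    "A",        1L,
  30,    55.2, 180000L,    "A",        1L,
  30.3,  55.3, 175000L,    "A",        1L
)

# All twenty rows of `cities`, copied from the printed output
cities_printed <- tribble(
  ~long, ~lat, ~city,
  24,    55,   "Kowno",
  25.3,  54.7, "Wilna",
  26.4,  54.4, "Smorgoni",
  26.8,  54.3, "Moiodexno",
  27.7,  55.2, "Gloubokoe",
  27.6,  53.9, "Minsk",
  28.5,  54.3, "Studienska",
  28.7,  55.5, "Polotzk",
  29.2,  54.4, "Bobr",
  30.2,  55.3, "Witebsk",
  30.4,  54.5, "Orscha",
  30.4,  53.9, "Mohilow",
  32,    54.8, "Smolensk",
  33.2,  54.9, "Dorogobouge",
  34.3,  55.2, "Wixma",
  34.4,  55.5, "Chjat",
  36,    55.5, "Mojaisk",
  37.6,  55.8, "Moscou",
  36.6,  55.3, "Tarantino",
  36.5,  55,   "Malo-Jarosewii"
)
```

Substituting these for `troops` and `cities` lets the plotting code in this section run on a small subset, though only the first stretch of the advance will be drawn.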

Minard’s grammar

  • Troops
    • Latitude
    • Longitude
    • Survivors
    • Advance/retreat
  • Cities
    • Latitude
    • Longitude
    • City name

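library(ggplot2)

# Layer 1: the troop path. Line width encodes survivors, color encodes
# advance vs. retreat, and `group` draws each group of troops as its own path.
# (In ggplot2 >= 3.4.0, `linewidth` is the preferred aesthetic for line width.)
plot_troops <- ggplot(data = troops,
                      mapping = aes(x = long, y = lat)) +
  geom_path(aes(size = survivors,
                color = direction,
                group = group))
plot_troops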

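# Layer 2: city labels from a second data frame; x = long and y = lat
# are inherited from the ggplot() call.
plot_both <- plot_troops + 
  geom_text(data = cities, mapping = aes(label = city), size = 4)
plot_both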

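# Adjust the scales, coordinate system, and labels.
plot_polished <- plot_both +
  # Widen the range of line widths and label the legend breaks
  scale_size(range = c(0, 12),
             breaks = c(10000, 20000, 30000),
             labels = c("10,000", "20,000", "30,000")) + 
  # One color per value of `direction`
  scale_color_manual(values = c("tan", "grey50")) +
  # Map projection; requires the mapproj package
  coord_map() +
  labs(title = "Map of Napoleon's Russian campaign of 1812",
       x = NULL,
       y = NULL)
plot_polished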

# Strip axes, gridlines, and background, then hide the legends.
plot_polished +
  theme_void() +
  theme(legend.position = "none")